People most likely to go missing in Mission district between 10am & 11am on Fridays

Summary of findings

  • Mission district has the highest number of missing person incidents
  • People are most likely to go missing between 10am & 11am
  • People are most likely to go missing on Fridays

In [1]:
import vincent
import pandas as pd
from vincent import AxisProperties, PropertySet, ValueRef
from vincent import Map

vincent.core.initialize_notebook()


Reading the file


In [2]:
incidents = pd.read_csv('sanfrancisco_incidents_summer_2014.csv')

Changing the column labels of the data set


In [3]:
incidents.columns = ['Id'
                    ,'Category'
                    ,'Description'
                    ,'DayOfWeek'
                    ,'Date'
                    ,'Time'
                    ,'District'
                    ,'Resolution'
                    ,'Address'
                    ,'Longitude'
                    ,'Latitude'
                    ,'Location'
                    ,'PdId']

The date and time of incident are in two separate columns. Combining them into a DateTime column


In [4]:
# The date and time of each incident are in two separate columns;
# combine them into a single DateTime column.
incidents['DateTime'] = pd.to_datetime(incidents['Date'] + ' ' + incidents['Time'])
date_idx = pd.DatetimeIndex(incidents['DateTime'])
# normalize() keeps the datetime64 dtype and sets the time to midnight
incidents['Date'] = date_idx.normalize()
incidents['Hour'] = date_idx.hour
incidents['Year'] = date_idx.year
incidents['Month'] = date_idx.month
incidents['Weekday'] = date_idx.weekday  # Monday=0 ... Sunday=6
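
As a quick check of the step above, the sketch below applies the same combine-and-extract logic to two hypothetical rows. The sample values and the explicit `format` string are assumptions made for illustration; the real file may store dates differently.

```python
import pandas as pd

# Hypothetical sample rows; the real values come from the CSV file.
sample = pd.DataFrame({
    'Date': ['08/01/2014', '08/03/2014'],
    'Time': ['10:30', '23:05'],
})

# Join the two string columns and parse them in one pass.
# The explicit format is an assumption about how the CSV stores dates.
sample['DateTime'] = pd.to_datetime(sample['Date'] + ' ' + sample['Time'],
                                    format='%m/%d/%Y %H:%M')

# The .dt accessor exposes the same fields that DatetimeIndex provides.
sample['Hour'] = sample['DateTime'].dt.hour
sample['Weekday'] = sample['DateTime'].dt.weekday  # Monday=0 ... Sunday=6

print(sample[['DateTime', 'Hour', 'Weekday']])
```

Passing an explicit format avoids day/month ambiguity when parsing, at the cost of failing loudly if a row does not match.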

Bar chart of the incidents by category

The plot shows that Larceny/Theft is the most reported category across all districts, with Assault second and Missing Person ninth.


In [5]:
# Count incidents per category and sort ascending for plotting
count_by_category = incidents.groupby('Category').size().reset_index(name='count')
count_by_category = count_by_category.sort_values(by='count')

graph = vincent.Bar(count_by_category,columns=['count'], key_on='Category')
graph.legend(title='Category')
graph.axis_titles(x='Category', y='Incident Count')
ax = AxisProperties(
    labels=PropertySet(
        angle=ValueRef(value=270),
        align=ValueRef(value='right')
        )
    )
graph.axes[0].properties = ax
graph.display()
#ax = count_by_category.plot(kind="barh",x='Category', y='count',sort_columns=True)
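
A ranking like the one quoted above can also be read off numerically rather than from the chart. The following is a minimal sketch on hypothetical data; the counts are invented, and only the `Category` column name is taken from the notebook.

```python
import pandas as pd

# Hypothetical incident categories, for illustration only.
toy = pd.DataFrame({'Category': ['LARCENY/THEFT'] * 5
                                + ['ASSAULT'] * 3
                                + ['MISSING PERSON'] * 2})

# value_counts() sorts in descending order by default.
counts = toy['Category'].value_counts()

# Rank 1 is the most frequent category; method='min' makes ties share a rank.
ranks = counts.rank(ascending=False, method='min').astype(int)

print(ranks['MISSING PERSON'])
```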



In [6]:
# Incident counts per weekday (rows) and district (columns)
by_weekday = incidents.pivot_table('Id'
                                , aggfunc='count'
                                , index='Weekday'
                                , columns='District')

In [7]:
graph = vincent.Line(by_weekday)
graph.legend(title='District')
graph.axis_titles(x='Weekday', y='Incident Count')

graph.display()
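
To make the shape of the pivot table concrete, here is a small sketch on hypothetical rows. It uses the same column names as the notebook; `fill_value=0` is an addition not used in the cells above, shown because combinations with no incidents would otherwise appear as NaN.

```python
import pandas as pd

# Hypothetical rows: one incident per line.
toy = pd.DataFrame({
    'Id':       [1, 2, 3, 4, 5],
    'Weekday':  [0, 0, 4, 4, 4],
    'District': ['MISSION', 'MISSION', 'MISSION', 'PARK', 'PARK'],
})

# Count Ids per (Weekday, District). PARK has no incidents on weekday 0,
# so that cell would be NaN without fill_value.
table = toy.pivot_table(values='Id', aggfunc='count', index='Weekday',
                        columns='District', fill_value=0)

print(table)
```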


Top incident category in Mission district

Limiting the incidents to Mission district and excluding the categories 'LARCENY/THEFT', 'NON-CRIMINAL', 'OTHER OFFENSES' and 'WARRANTS'.

The plot shows that Missing Person is the third-highest incident category in Mission district.


In [8]:
filtered = incidents[incidents['District'] == 'MISSION']
filtered = filtered[~filtered['Category'].isin(['LARCENY/THEFT'
                                               ,'NON-CRIMINAL'
                                               ,'OTHER OFFENSES'
                                               ,'WARRANTS'])]
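
The two filtering steps above can also be written as a single boolean mask. This is a sketch on hypothetical rows, not a replacement for the cell above; the `excluded` list mirrors the categories named in the text.

```python
import pandas as pd

# Hypothetical rows, for illustration only.
toy = pd.DataFrame({
    'District': ['MISSION', 'MISSION', 'MISSION', 'PARK'],
    'Category': ['WARRANTS', 'ASSAULT', 'MISSING PERSON', 'ASSAULT'],
})
excluded = ['LARCENY/THEFT', 'NON-CRIMINAL', 'OTHER OFFENSES', 'WARRANTS']

# Combine both conditions into one mask; ~ negates the isin() result.
mask = (toy['District'] == 'MISSION') & ~toy['Category'].isin(excluded)
kept = toy[mask]

print(kept)
```

The parentheses around the equality test matter, because `&` binds more tightly than `==` in Python.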

In [9]:
# Count the remaining Mission district incidents per category
count_by_category = filtered.groupby('Category').size().reset_index(name='count')
count_by_category = count_by_category.sort_values(by='count')

graph = vincent.Bar(count_by_category,columns=['count'], key_on='Category')
graph.legend(title='Category')
graph.axis_titles(x='Category', y='Incident Count')
ax = AxisProperties(
    labels=PropertySet(
        angle=ValueRef(value=270),
        align=ValueRef(value='right')
        )
    )
graph.axes[0].properties = ax
graph.display()


The result is surprising: "Missing Person" incidents rank fourth highest here, well above their rank across San Francisco overall.

Comparing this trend across other districts

The comparison shows that the trend is unique to Mission district, where the number of missing person incidents is high compared with the other districts.


In [10]:
filter_by_category = 'MISSING PERSON'

In [11]:
filtered = incidents[incidents['Category'] == filter_by_category]
by_hour = filtered.pivot_table('Id'
                                , aggfunc='count'
                                , index='Hour'
                                , columns='District')

graph = vincent.Line(by_hour) #,columns=['count'],key_on='District')
graph.legend(title='District')
graph.axis_titles(x='Hour', y='Incident Count')

graph.display()


Bar chart of the "Missing Person" incidents across districts

The plot shows that Missing Person incidents in Mission district are almost double those in Southern district, which has the next-highest count.


In [12]:
# Count "Missing Person" incidents per district
count_by_district = filtered.groupby('District').size().reset_index(name='count')
count_by_district = count_by_district.sort_values(by='count')

graph = vincent.Bar(count_by_district,columns=['count'], key_on='District')
graph.legend(title='District')
graph.axis_titles(x='District', y='Incident Count')
ax = AxisProperties(
    labels=PropertySet(
        angle=ValueRef(value=270),
        align=ValueRef(value='right')
        )
    )
graph.axes[0].properties = ax
graph.display()
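
The chart above compares raw incident counts. If "rate" is read as the share of a district's incidents that are Missing Person cases, it could be computed along the following lines. This is a sketch under that assumption, on hypothetical data; the notebook itself does not compute this share.

```python
import pandas as pd

# Hypothetical rows: 4 incidents in MISSION, 8 in SOUTHERN.
toy = pd.DataFrame({
    'District': ['MISSION'] * 4 + ['SOUTHERN'] * 8,
    'Category': ['MISSING PERSON', 'MISSING PERSON', 'ASSAULT', 'ASSAULT']
                + ['MISSING PERSON'] + ['ASSAULT'] * 7,
})

is_missing = toy['Category'] == 'MISSING PERSON'

# The mean of a boolean column per group is the fraction of True values.
share = is_missing.groupby(toy['District']).mean()

print(share)
```

A share normalizes for how busy each district is overall, so a district can lead on counts without leading on share.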


Top 4 districts where "Missing Person" cases are highest

Restricting the hourly comparison to the 4 districts where "Missing Person" incidents are highest: Mission, Southern, Park and Bayview.


In [13]:
filter_by_districts = ['MISSION','SOUTHERN','PARK','BAYVIEW']

In [14]:
filtered = incidents[incidents['Category'] == filter_by_category]
filtered = filtered[filtered['District'].isin(filter_by_districts)]
by_hour = filtered.pivot_table('Id'
                                , aggfunc='count'
                                , index='Hour'
                                , columns='District')

graph = vincent.Line(by_hour) #,columns=['count'],key_on='District')
graph.legend(title='District')
graph.axis_titles(x='Hour', y='Incident Count')

graph.display()


Missing Person incidents peak at 11 am in Mission district

The plot shows another surprising peak in the Missing Person incidents at 11 am.


In [15]:
filter_by_district = 'MISSION'
filter_by_category = 'MISSING PERSON'
filtered = incidents[incidents['Category'] == filter_by_category]
filtered = filtered[filtered['District'] == filter_by_district]

In [16]:
by_hour = filtered.pivot_table('Id'
                              , aggfunc='count'
                              , index='Hour'
                              , columns='Category')

graph = vincent.Line(by_hour) 
graph.legend(title='Category')
graph.axis_titles(x='Hour', y='Incident Count')

graph.display()


Most missing person cases seem to occur between 10am and noon, with the peak at 11am


In [17]:
by_hour = filtered.pivot_table('Id'
                                , aggfunc='count'
                                , index='Hour'
                                , columns='District')

graph = vincent.Bar(by_hour) 
graph.legend(title='District')
graph.axis_titles(x='Hour', y='Incident Count')

graph.display()
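
The peak hour can be confirmed without reading it off the chart. The sketch below uses a hypothetical list of hours; in the notebook the equivalent input would be the `Hour` column of `filtered`.

```python
import pandas as pd

# Hypothetical hours of the day at which incidents were reported.
hours = pd.Series([9, 10, 10, 11, 11, 11, 14], name='Hour')

# Count incidents per hour, ordered by hour rather than by frequency.
per_hour = hours.value_counts().sort_index()

# idxmax() returns the index label (the hour) with the largest count.
peak_hour = per_hour.idxmax()

print(peak_hour, per_hour[peak_hour])
```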


Missing Person incidents in Mission district peak on Friday


In [18]:
by_weekday = filtered.pivot_table('Id'
                                , aggfunc='count'
                                , index='Weekday'
                                , columns='District')

graph = vincent.Bar(by_weekday) 
graph.legend(title='District')
graph.axis_titles(x='Weekday', y='Incident Count')

graph.display()



In [19]:
# Weekday 4 is Friday (Monday=0), so keep only the Friday incidents
filtered = filtered[filtered['Weekday'] == 4]
by_hour = filtered.pivot_table('Id'
                                , aggfunc='count'
                                , index='Hour'
                                , columns='District')

graph = vincent.Bar(by_hour) 
graph.legend(title='District')
graph.axis_titles(x='Hour', y='Incident Count')

graph.display()
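
The filter on `Weekday == 4` relies on the Monday=0 numbering, which pandas shares with Python's standard `date.weekday()`. A minimal check using only the standard library, with a date chosen for illustration:

```python
from datetime import date
import calendar

# 1 August 2014 fell on a Friday.
friday = date(2014, 8, 1)

# weekday() counts from Monday=0, so Friday is 4.
weekday_number = friday.weekday()

print(weekday_number, calendar.day_name[weekday_number])
```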


Conclusion

  • Mission district has the highest number of missing person incidents
  • People are most likely to go missing around 10am to 11am
  • People are most likely to go missing on Fridays

People most likely to go missing in Mission District between 10am & 11am on Fridays

